Representations and metrics

Question 1

Screenshot taken from Coursera

Answer

https://www.coursera.org/learn/ml-clustering-and-retrieval/discussions/weeks/2/threads/yoIkdz9XEeaQYwrcKTQWAQ
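In scaled Euclidean distance the weight multiplies the squared coordinate difference, so when a feature is scaled by its range (max minus min), the scaling factor gets squared too: weight = (1 / (max - min))^2.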



In [13]:

import numpy as np

def calculate_weight(feature):
    # Squared reciprocal of the feature's range (max - min)
    weight = (1 / (max(feature) - min(feature))) ** 2
    return weight

price = calculate_weight(np.array([500000, 350000, 600000, 400000], dtype=float))
room = calculate_weight(np.array([3, 2, 4, 2], dtype=float))
lot = calculate_weight(np.array([1840, 1600, 2000, 1900], dtype=float))

print(price)
print(room)
print(lot)

1.6e-11
0.25
6.25e-06
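
As a rough illustration of how these weights are used, the sketch below plugs them into a scaled Euclidean distance between the first two houses in the arrays above. The function name `scaled_euclidean` and the arrangement of the three arrays into per-house rows are assumptions made for illustration; they are not part of the quiz.

```python
import numpy as np

# Rows are houses, columns are (price, rooms, lot size), taken from the arrays above.
houses = np.array([
    [500000, 3, 1840],
    [350000, 2, 1600],
    [600000, 4, 2000],
    [400000, 2, 1900],
], dtype=float)

# One weight per feature: squared reciprocal of that feature's range.
weights = (1 / (houses.max(axis=0) - houses.min(axis=0))) ** 2

def scaled_euclidean(x, y, w):
    """Scaled Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return np.sqrt(np.sum(w * (x - y) ** 2))

d01 = scaled_euclidean(houses[0], houses[1], weights)
print(weights)
print(d01)
```

With these weights each feature contributes on a comparable scale, so the price difference no longer dominates the distance just because prices are large numbers.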

Question 2

Screenshot taken from Coursera

Answer

Word counts
- Sentence 1: [2, 1, 1, 1, 1, 1, 1, 1, 0, 0]
- Sentence 2: [0, 2, 1, 1, 0, 0, 0, 1, 2, 1]
Euclidean distance:



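In [16]:

import numpy as np

# Word count vectors over the shared 10-word vocabulary
s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)
print(s1)
print(s2)

# Euclidean distance: square root of the sum of squared differences
euclidean_distance = np.sqrt(np.sum((s1 - s2)**2))
euclidean_distance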

[ 2.  1.  1.  1.  1.  1.  1.  1.  0.  0.]
[ 0.  2.  1.  1.  0.  0.  0.  1.  2.  1.]

Out[16]:

3.6055512754639891
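
The count vectors `s1` and `s2` above were typed in by hand. A minimal sketch of deriving them from text follows. The two sentences are an assumption: they are not shown in these notes and were chosen because their word counts reproduce `s1` and `s2`. The helper name `word_count_vector` is likewise hypothetical.

```python
import numpy as np
from collections import Counter

# Assumed sentences (not shown in these notes); their counts match s1 and s2.
sentence_1 = "The quick brown fox jumps over the lazy dog"
sentence_2 = "A quick brown dog outpaces a quick fox"

# Vocabulary in order of first appearance across both sentences.
vocabulary = []
for word in (sentence_1 + " " + sentence_2).lower().split():
    if word not in vocabulary:
        vocabulary.append(word)

def word_count_vector(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    return np.array([counts[word] for word in vocabulary], dtype=float)

v1 = word_count_vector(sentence_1, vocabulary)
v2 = word_count_vector(sentence_2, vocabulary)

# np.linalg.norm of the difference is the Euclidean distance.
distance = np.linalg.norm(v1 - v2)
print(vocabulary)
print(distance)
```

Both vectors must be built over the same vocabulary in the same order; otherwise the coordinate-wise difference is meaningless.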

Question 3

Screenshot taken from Coursera



In [17]:

import numpy as np

s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)
print(s1)
print(s2)

# Cosine similarity: dot product divided by the product of the vector norms
cosine_similarity = np.dot(s1, s2)/(np.sqrt(np.sum(s1**2)) * np.sqrt(np.sum(s2**2)))
cosine_distance = 1 - cosine_similarity
cosine_distance

[ 2.  1.  1.  1.  1.  1.  1.  1.  0.  0.]
[ 0.  2.  1.  1.  0.  0.  0.  1.  2.  1.]

Out[17]:

0.5648058601107554
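
The same computation can be wrapped in a small function to compare how the two metrics react to document length. This is a sketch only: the function name `cosine_distance` and the factor of 3 are arbitrary choices for illustration. Multiplying `s1` by a constant, as if the sentence were repeated, leaves the cosine distance unchanged but increases the Euclidean distance.

```python
import numpy as np

def cosine_distance(x, y):
    """1 - (x . y) / (||x|| * ||y||)."""
    return 1 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)

# Cosine distance depends only on direction, not on vector length.
base = cosine_distance(s1, s2)
scaled = cosine_distance(3 * s1, s2)

# Euclidean distance grows when one vector is stretched.
euclid_base = np.linalg.norm(s1 - s2)
euclid_scaled = np.linalg.norm(3 * s1 - s2)

print(base, scaled)
print(euclid_base, euclid_scaled)
```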

Question 4

Screenshot taken from Coursera

Question 5

Screenshot taken from Coursera

Answer

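Given the number of documents and $tf \cdot idf = 0$ for the word, and assuming the word occurs in the document (so $tf > 0$), it follows that $idf = 0$:

$$idf = \large \log \frac{\text{# docs}}{\text{1 + # docs containing the word}} = 0$$

$$\large \frac{\text{# docs}}{\text{1 + # docs containing the word}} = e^0 = 1$$

So the number of documents containing the word is # docs - 1.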
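
A quick numeric check of this reasoning, as a sketch: it assumes the IDF definition log(# docs / (1 + # docs containing the word)), and the corpus size of 10 is a hypothetical value chosen only for illustration.

```python
import math

def idf(num_docs, num_docs_with_word):
    # Assumed definition: log(# docs / (1 + # docs containing the word)).
    return math.log(num_docs / (1 + num_docs_with_word))

num_docs = 10  # hypothetical corpus size, for illustration only

# IDF for every possible number of documents containing the word.
values = {n: idf(num_docs, n) for n in range(num_docs + 1)}

# IDF is exactly zero only when the ratio inside the log equals 1.
zero_at = [n for n, v in values.items() if v == 0.0]
print(zero_at)
```

Under this definition the IDF is zero only when the word appears in all but one document, and it turns slightly negative when the word appears in every document.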

Question 6

Screenshot taken from Coursera